Abstract

In this work, an algorithm is proposed for real-time
detection and classification of beluga whale calls. The
detection algorithm is based on an adaptive activity detector
that exploits a priori knowledge of the longest and shortest
beluga whale sound units. Optimum parameter values of the
proposed detector are obtained by simulation to maximize
the difference between Detection Probability and False
Alarm Probability. A set of features that allows successful
classification by means of a Naive Bayes classification
algorithm is put forward as well. Three classification
categories related to observed beluga behaviours were
selected. From a different perspective, the proposed features
can be employed to obtain clues about how beluga sounds are
produced and the degree of control that the species has over
its internal organs. As an example, the presence of
nonlinearities such as subharmonics and frequency jumps
has been detected and related to some extracted features.
This technique can serve as a complement to more rigorous
studies based on video information and ultrasonic sensors
for whale monitoring.
Introduction

In 1971, Payne and McVay defined the structure of
humpback whale songs as themes that are repeated in
specific patterns, whose building blocks are termed
units (the shortest continuous sound between two
silences) [15]. A relatively recent study shows that
vocalization patterns might be a useful tool to measure
"animal welfare" in belugas (Delphinapterus leucas). In
his work, Castellote demonstrates that during stress
periods, such as those produced by veterinary
manipulation or air transportation to new facilities, the
vocalization rate and patterns change [12]. Other
studies suggest that automatic systems for
continuously monitoring communicative beluga
vocalizations could become an important tool for
research on animal behaviour and animal care [11].
The implementation of these detection systems is not
an easy task and has never been done for belugas. The
rich vocabulary of beluga whales, the complexity of
their songs and the presence of interference from
different sources make it difficult to design automatic
detectors.

The algorithms typically employed are not prepared to
work in real time. Most of the proposed detectors are
based on the computation of some kind of time-frequency
representation. The vocalization patterns are then
searched for by means of different signal
processing techniques: spectrogram correlation
detector (XBAT) [3, 5], cross correlation with a
matched filter kernel [3, 9], neural networks [3], etc.
Although these approaches give quite accurate results,
they were developed as research laboratory tools rather
than for designing continuous monitoring systems.
Additionally, the aforementioned algorithms need
processors with high computational capacity.

This work proposes an alternative approach to the
detection and classification of beluga whale
vocalizations, based on a reduced set of extracted
features. This technique allows real-time processing of
the audio recordings and the design of automatic systems.

The remainder of this paper is structured as follows. In 
the second section, we present an adaptive activity 
detector and the selected features for classification. In 
the third section, we validate the proposed system in a 
controlled environment. In the fourth section, we give
some examples using the presented features to 
automatically detect the presence of nonlinearities. 
Finally, we present our conclusions and future work. 

The research was carried out in the Oceanografic
facilities (Ciudad de las Artes y las Ciencias de
Valencia) as part of a collaborative research project
between the Institute of Telecommunications and
Multimedia Applications and the Oceanografic
biologists.

The Proposed Approach 

The proposed approach to the problem is 
schematically described in Fig. 1. The system is 
composed of two main blocks: an adaptive activity 
detector and a feature-based classifier. Before
explaining the different parts of the proposed
algorithm in detail, some assumptions on the signal
statistics must be stated. It is assumed that the signal acquired
by the hydrophone can be modeled as the sum of two 
stochastic processes: a quasi-stationary process that 
models the ambient noise and a quasi-stationary 
process that models the whale vocalization. These 
assumptions have been made after studying a large 
number of records provided by the Oceanografic 
researchers. They are quite accurate if the analysis 
window is short enough. 

[Figure 1: block diagram. The input passes through a high pass filter to give x(n), which feeds the activity detector. On "No" (no activity) nothing is passed on; on "Yes" features are extracted to form the vector Vocalization = [v_1, v_2, v_3, v_4, v_5, ..., v_14], which feeds the "Classification and Statistics" block (Type 1, Type 2 or Type 3 vocalization).]

FIG. 1 PROPOSED AUTOMATIC DETECTOR FOR BELUGA
WHALE VOCALIZATIONS

Adaptive Activity Detector 

The first stage of the algorithm consists of a Finite
Impulse Response digital high pass filter (1000 Hz
cutoff frequency). This filter removes noise
components outside the beluga frequency range and has
proved to be an effective tool to remove low frequency
interfering sounds produced by other species that
can be found at the Oceanografic facilities.
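As an illustration of this first stage, the following Python sketch designs a linear-phase FIR high pass filter by the windowed-sinc method. Only the 1000 Hz cutoff and the 96 kHz sampling rate of the recordings come from this paper; the design method, the Hamming window, the 401 taps and all function names are assumptions made for the example, not the filter actually used.

```python
import cmath
import math


def design_highpass_fir(cutoff_hz, fs_hz, num_taps=401):
    """Windowed-sinc high pass FIR (Hamming window, spectral inversion)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for this design")
    fc = cutoff_hz / fs_hz                  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * window)
    dc_gain = sum(lowpass)
    lowpass = [h / dc_gain for h in lowpass]    # unit gain at 0 Hz
    highpass = [-h for h in lowpass]
    highpass[mid] += 1.0                        # delayed impulse minus low pass
    return highpass


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter frequency response at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


def fir_filter(taps, x):
    """Direct-form FIR filtering with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * x[n - k]
        y.append(acc)
    return y
```

Calling `design_highpass_fir(1000.0, 96000.0)` returns the coefficients, and `gain_at` can be used to check that low frequency interference is attenuated while the band above the cutoff is passed almost unchanged.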

After this high pass filter, we employ an activity
detector algorithm fine-tuned for beluga whale calls.
The proposed activity detector is similar to many
long-established detectors applied in robust detection of
speech [8]. However, we have added some
modifications to these detectors that take advantage of
the structure of beluga whale songs.

Let us assume that the stochastic processes that model
the ambient noise and the beluga whale song picked up
by the hydrophone are zero mean Gaussian processes
of given variance. If noise and signal (beluga sounds)
were zero mean white Gaussian stationary processes
of different variances, the Neyman Pearson detector
would be the energy detector [17]. Unfortunately, the
noise component resembles a quasi-stationary process
more closely than a stationary one: the noise
variance changes slowly with time, for instance as the
ambient noise changes.

Using a fixed threshold energy detector will lead to a
large number of false detections if the threshold is too low,
or to missed sound units if the threshold is too high. In
this work, we propose an energy detector with an
adaptive threshold that exploits some a priori
knowledge of the longest and shortest beluga sound
units with communicative sense.

The algorithm is detailed as follows. If we call $x(n)$ the
stochastic process at the output of the high pass filter,
then the hypothesis test is:

$$H_0: x(n) = w(n), \qquad n = 0, 1, 2, \ldots, N-1$$

$$H_1: x(n) = s(n) + w(n), \qquad n = 0, 1, 2, \ldots, N-1$$

where $w(n)$ is the underlying Gaussian noise and
$s(n)$ is the beluga vocalization. The value $N$ is related
to the minimum length of the building blocks of beluga
whale songs. If $N$ is small enough, the stochastic
processes can be considered stationary in this time
interval; consequently, the equations given in [17] are
valid and the Neyman Pearson detector decides $H_1$ if

$$T(x) = \sum_{n=0}^{N-1} x(n)^2 > \gamma' \qquad (1).$$

The threshold $\gamma'$ can be obtained by means of Eq. (2),

$$\gamma' = \sigma^2 \, Q^{-1}_{\chi^2_N}(PFA_{np}) \qquad (2),$$

where $Q^{-1}_{\chi^2_N}$ is the inverse of the right-tail
probability function of the chi-square distribution with
$N$ degrees of freedom. The value $\sigma^2$ is the noise
variance at the output of the filter and $PFA_{np}$ is the
desired false alarm probability of the Neyman Pearson
detector.
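To make Eq. (1) and Eq. (2) concrete, the sketch below computes the threshold and the decision for one block using only the Python standard library. The inverse chi-square right-tail function is not available there, so it is approximated with the Wilson-Hilferty formula; that approximation and all function names are assumptions of this example.

```python
from statistics import NormalDist


def chi2_right_tail_inverse(pfa, dof):
    """Approximate x such that P(chi-square with dof degrees > x) = pfa.

    Uses the Wilson-Hilferty cube-root approximation (an assumption of this
    sketch, reasonable for the large N used with audio blocks).
    """
    z = NormalDist().inv_cdf(1.0 - pfa)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def energy_threshold(noise_var, n_samples, pfa):
    """Threshold of Eq. (2): noise variance times the chi-square quantile."""
    return noise_var * chi2_right_tail_inverse(pfa, n_samples)


def energy_detector(block, noise_var, pfa):
    """Decide H1 (True) when the block energy of Eq. (1) exceeds Eq. (2)."""
    statistic = sum(v * v for v in block)
    return statistic > energy_threshold(noise_var, len(block), pfa)
```

For a block of N samples of pure noise the statistic is close to N times the noise variance, so the threshold has to sit somewhat above that value; how far above is controlled by $PFA_{np}$.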

If the threshold $\gamma'$ is kept fixed over time, the
conventional energy detector is obtained, which is the
optimum detector for stationary processes. However,
as previously stated, the noise process is
quasi-stationary. Let us call $x_i$ the $i$-th $N$-sample
fragment of $x(n)$. The proposed adaptive
threshold energy detector works in blocks of $N$
samples ($x_i$) and the threshold $\gamma'$ is recalculated from
block to block according to the algorithm detailed in
Fig. 2.

In the proposed algorithm, $M$ is the length of the
longest beluga vocalization with communicative sense
and $N$ is the length of the shortest beluga
vocalization, $\sigma_i^2$ is the variance of the $i$-th fragment of
the acquired signal (the operator $\mathrm{Var}[\cdot]$ is used to calculate the
variance) and $\Delta\alpha$ controls how fast the adaptive
algorithm adapts to changes in the noise profile.

1. $\alpha = 1$, $i_0 = 1$, $i = 1$ and $\sigma_w^2$ is initialized to the noise variance.

2. If the number of consecutive detections is longer than $M$, then $\alpha = \alpha + \Delta\alpha$ and make $i = i_0$.

3. Read block $x_i$ and calculate $\sigma_i^2 = \mathrm{Var}[x_i]$.

4. If $\sigma_i^2 < (\alpha + \sigma_w^2)$ then $\sigma_w^2 = \sigma_i^2$.

5. Apply the energy detector to block $x_i$ according to Eq. (1) and Eq. (2) with $\sigma_w^2$.

• If no detection: $\alpha = 1$, advance to the next block ($i = i + 1$).

• If detection:

o If there is no detection in the previous segment, then make $i_0 = i$, save the detection for the classification stage and advance to the next block ($i = i + 1$).
o If there is detection in the previous segment, then save the detection for the classification stage and advance to the next block ($i = i + 1$).

6. Go to step 2 (until the end of $x(n)$).

FIG. 2 PROPOSED ADAPTIVE THRESHOLD ACTIVITY
DETECTOR
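The steps of Fig. 2 can be sketched in Python as follows. This is one possible reading, not the authors' implementation. Step 4 is coded as it appears in the figure (block variance compared with the sum of the adaptation parameter and the noise estimate); a product of the two is another plausible reading and would make the rule independent of signal scale. Resetting the run counter after a rewind, expressing M in blocks, the `max_alpha` guard and the Wilson-Hilferty quantile approximation are further assumptions of this sketch.

```python
from statistics import NormalDist, pvariance


def chi2_right_tail_inverse(pfa, dof):
    """Wilson-Hilferty approximation of the chi-square right-tail inverse."""
    z = NormalDist().inv_cdf(1.0 - pfa)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def adaptive_detector(x, n, m_blocks, delta_alpha, pfa, noise_var,
                      max_alpha=1e6):
    """Return one flag per N-sample block (True means detection)."""
    blocks = [x[k:k + n] for k in range(0, len(x) - n + 1, n)]
    quantile = chi2_right_tail_inverse(pfa, n)
    flags = [False] * len(blocks)
    alpha, i, i0, run = 1.0, 0, 0, 0            # step 1 (0-based indices)
    while i < len(blocks):
        # Step 2: a run longer than the longest call is treated as noise.
        if run > m_blocks and alpha < max_alpha:
            alpha += delta_alpha
            i, run = i0, 0                      # rewind (run reset: assumption)
        block = blocks[i]
        var_i = pvariance(block)                # step 3
        if var_i < alpha + noise_var:           # step 4, as printed in Fig. 2
            noise_var = var_i
        energy = sum(v * v for v in block)      # step 5: Eq. (1) vs Eq. (2)
        detected = energy > noise_var * quantile
        flags[i] = detected
        if detected:
            if run == 0:
                i0 = i                          # first block of a new run
            run += 1
        else:
            alpha, run = 1.0, 0
        i += 1
    return flags
```

With blocks of 0.1 s and a longest vocalization of 1.5 s, `m_blocks` would be 15. Flags of blocks that are re-read after a rewind are overwritten, so a sustained rise in the noise level is eventually absorbed into the noise estimate instead of being reported as one long vocalization.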


It is important to control the magnitude of the
parameter $\Delta\alpha$. If this value is very small, the algorithm
follows noise variations very precisely, producing a
large number of false alarms when the noise power
changes; the convergence process thus becomes slow.
On the other hand, if $\Delta\alpha$ is very high, the algorithm
follows noise changes in larger steps and the detection
probability of small amplitude vocalizations decreases.
The best value for $\Delta\alpha$ has been obtained through 3000
Monte Carlo simulations of random vocalization
patterns for different $\Delta\alpha$ values.

Results are shown in Fig. 3, where it can be seen that
$\Delta\alpha = 2.1$ maximizes the difference PD-PFA (blue
curve).

FIG. 3 DETECTION PROBABILITY (PD), FALSE ALARM
PROBABILITY (PFA) AND DIFFERENCE (PD-PFA) FOR THE
PROPOSED ALGORITHM AS A FUNCTION OF $\Delta\alpha$ ($PFA_{np} = 10^{-4}$)

Fig. 4 compares (in a time varying noise scenario) a
conventional fixed threshold energy detector with the
proposed adaptive threshold algorithm. In this
simulation, the noise variance changes slowly
compared to the variance of the beluga vocalizations.

FIG. 4 COMPARISON OF THE PROPOSED ADAPTIVE
THRESHOLD TO CONSTANT THRESHOLD ($M = 3 \cdot N$). BLUE
CURVE: SIMULATED SIGNAL, RED DASHED CURVE:
CONSTANT THRESHOLD AND GREEN DASHED CURVE:
PROPOSED ADAPTIVE THRESHOLD. THE CHARACTER
INDICATES WHERE THERE ARE DETECTIONS
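The selection criterion used for the adaptation parameter (maximizing PD-PFA) can be sketched as below. Counting detections and false alarms per block is an assumption of this example, as are the function names; the paper does not state how the two probabilities were counted in the simulations.

```python
def detection_rates(detected, truth):
    """Return (PD, PFA) from per-block detector flags and ground truth."""
    hits = sum(1 for d, t in zip(detected, truth) if d and t)
    false_alarms = sum(1 for d, t in zip(detected, truth) if d and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    pd = hits / positives if positives else 0.0
    pfa = false_alarms / negatives if negatives else 0.0
    return pd, pfa


def best_delta_alpha(results):
    """results maps each tested value to (PD, PFA); return the best value."""
    return max(results, key=lambda da: results[da][0] - results[da][1])
```

In a Monte Carlo study, `detection_rates` would be averaged over the simulated patterns for each candidate value before calling `best_delta_alpha`.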

Features Description and Classifier Details 

Once beluga vocalizations are detected, the obtained
vocalization units must be classified. The possible
classification categories related to observed animal
behaviour were provided by the Oceanografic
biologists based on previous works [12]. In order to
enhance the previous detection process, an additional
category "noise" is added to the original list. The
objective of this category is to remove possible noise
events that resemble beluga vocalizations and might
have been detected by the proposed activity detector.
With this category we can remove interfering sounds
such as sounds from other species or human produced
noise.


TABLE 1 CLASSIFICATION CATEGORIES EMPLOYED WITH SOME SOUND
EXAMPLES FROM THEIR REPERTOIRE AND A BRIEF DESCRIPTION RELATED
TO ANIMAL BEHAVIOUR

| Category | Some examples | Brief description |
|---|---|---|
| Tonal (Fig. 5a) | Single tonal; multitonal; up-sweep whistle; down-sweep whistle | Tonal vocalizations are typically associated with communicative behaviour |
| Pulsed (Fig. 5b) | Pulsed train; echolocation | Trains of pulses with communicative or aggressive component, as well as echolocation functions |
| Jawclap (Fig. 5c) | Jaw claps | Impulsive jaw claps, generally aggressive sounds |
| Noise | Underwater ambient noise | Noise category to remove incorrect detection events |

The categories and their main characteristics are
summarized in Table 1. Fig. 5 shows a sample
time-frequency representation of a sound unit
belonging to each of the categories.

A relatively small number of features have been 
chosen to best describe and distinguish each 
vocalization category. Since time-frequency 
representation is not needed to calculate the different 
features, real time processing is easily achieved with 
low cost processors. 

Table 2 shows the whole set of parameters and
their corresponding label numbers. These features have
been selected to suit the different categories in order
to maximize the classification rate.

As previously stated, some features have
been selected as a simple, low computational
complexity approximation to the shape of the
time-frequency representation. For instance, the features
$[v_1, \ldots, v_9]$ summarize the resonant frequencies and
bandwidths of beluga multitonal vocalizations using
the Fourier transform instead of the spectrogram.


TABLE 2 FEATURE SET EMPLOYED IN THE AUTOMATIC DETECTOR

| Feature number | Short description |
|---|---|
| $v_1$ | Fundamental frequency $f_0$ |
| $v_2$ | Q-factor of $f_0$ = $\Delta f_0 / f_0$ |
| $v_3$ | Power spectral density at the frequency $f_0$, $S_x(f_0)$ |
| $v_4$ | First harmonic $f_1$ |
| $v_5$ | Q-factor of $f_1$ = $\Delta f_1 / f_1$ |
| $v_6$ | Power spectral density at the frequency $f_1$, $S_x(f_1)$ |
| $v_7$ | Second harmonic $f_2$ |
| $v_8$ | Q-factor of $f_2$ = $\Delta f_2 / f_2$ |
| $v_9$ | Power spectral density at the frequency $f_2$, $S_x(f_2)$ |
| $v_{10}$ | Skewness of the vocalization |
| $v_{11}$ | Kurtosis of the vocalization |
| $v_{12}$ | Autocovariance test of the vocalization |
| $v_{13}$ | Time reversibility measure of the vocalization |
| $v_{14}$ | Voiced/unvoiced measure |


Fig. 6 illustrates each feature definition, where $f_0$, $f_1$
and $f_2$ are the fundamental frequency, first and second
harmonic, respectively. The -3 dB bandwidths $\Delta f_0$,
$\Delta f_1$ and $\Delta f_2$ are employed to obtain the normalized
bandwidth or Q-factor (features $v_2$, $v_5$ and $v_8$)
according to the equations given in Table 2. The power
spectral amplitude at the three main frequencies is
also obtained (features $v_3$, $v_6$ and $v_9$).
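A minimal sketch of how one triplet of frequency features (peak frequency, normalized bandwidth and peak power) could be read from a sampled power spectrum is given below. It assumes that the spectrum has already been estimated and that a search band for each peak is supplied by the caller; the function name and the band limits are hypothetical. The normalized bandwidth follows the definition of Table 2 (bandwidth divided by peak frequency).

```python
def peak_features(freqs, psd, lo_hz, hi_hz):
    """Return (peak frequency, -3 dB bandwidth / peak frequency, peak power).

    freqs and psd are equal-length sequences describing a power spectrum;
    the peak is searched only inside the band [lo_hz, hi_hz].
    """
    band = [k for k, f in enumerate(freqs) if lo_hz <= f <= hi_hz]
    if not band:
        raise ValueError("no spectrum bins inside the requested band")
    k_peak = max(band, key=lambda k: psd[k])
    half_power = psd[k_peak] / 2.0          # -3 dB relative to the peak
    left = k_peak
    while left > 0 and psd[left - 1] >= half_power:
        left -= 1
    right = k_peak
    while right < len(psd) - 1 and psd[right + 1] >= half_power:
        right += 1
    bandwidth = freqs[right] - freqs[left]
    return freqs[k_peak], bandwidth / freqs[k_peak], psd[k_peak]
```

Calling the function three times, with bands around the fundamental and the first two harmonics, would give the nine values $v_1$ to $v_9$.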




FIG. 5 TIME FREQUENCY REPRESENTATIONS OF: A) PURE TONAL VOCALIZATION (MULTITONAL), B) PURE PULSED VOCALIZATION
AND C) JAWCLAP VOCALIZATION







Advances in Applied Acoustics (AIAA) Volume 2 Issue 2, May 2013 


www.aiaa-journal.org 


[Figure 6: sketch of the power spectrum $S_x(f)$ against $f$ [Hz], marking the peaks $S_x(f_0)$, $S_x(f_1)$ and $S_x(f_2)$ and the bandwidths $\Delta f_0$, $\Delta f_1$ and $\Delta f_2$.]

FIG. 6 DEFINITION OF FREQUENCY RELATED FEATURES ($v_1$ TO $v_9$)

Other parameters are related to higher order statistics
of the vocalization (features $v_{10}$, $v_{11}$, $v_{12}$ and $v_{13}$). The
preliminary study carried out by the researchers
showed that this statistical information could be useful
to identify some particular sound units. The feature
$v_{10}$ computes the skewness of the sound unit. If the
vocalization is regarded as a stochastic process, the
skewness, a measure of the asymmetry of the
probability distribution, is computed as described in
Eq. (3).

$$v_{10} = \frac{E\left[(x(n)-\bar{x})^3\right]}{\left(E\left[(x(n)-\bar{x})^2\right]\right)^{3/2}} \approx \frac{\frac{1}{N}\sum_{n=1}^{N}(x(n)-\bar{x})^3}{\left(\frac{1}{N}\sum_{n=1}^{N}(x(n)-\bar{x})^2\right)^{3/2}} \qquad (3)$$

The operator $E[\cdot]$ is the expected value operator and $\bar{x}$
is the arithmetic average. The kurtosis (feature $v_{11}$), a
measure of the "peakedness" of the probability
distribution, is obtained by sample averaging as
described in Eq. (4).

$$v_{11} = \frac{\frac{1}{N}\sum_{n=1}^{N}(x(n)-\bar{x})^4}{\left(\frac{1}{N}\sum_{n=1}^{N}(x(n)-\bar{x})^2\right)^{2}} \qquad (4)$$

Two parameters that give information about
the nonlinearity of the vocalization have also been added to
the feature set: the third-order autocovariance and the time
reversibility test of the vocalization (parameters $v_{12}$
and $v_{13}$ of the feature set).

The so-called "third-order autocovariance" [18, 20] is a
higher-order extension of the traditional
autocovariance. This parameter was computed as
described in [2] (see Eq. (5)).

$$v_{12} = E[x(n)\,x(n-1)\,x(n-2)] \qquad (5)$$

The time reversibility parameter ($v_{13}$) is a sample
estimate of the slope skewness normalized by the
estimate of the slope standard deviation to the third
power ($\hat{\sigma}^3$) [22]. A statistical process is said to possess
time-reversibility if its statistical properties are identical
when examined both forwards and backwards in time.
This feature is computed as:

$$v_{13} = \frac{1}{N}\sum_{n=1}^{N}\left(\frac{x(n)-x(n-1)}{\hat{\sigma}}\right)^3 \qquad (6)$$

For time series that exhibit time-reversibility,
$v_{13} \approx 0$ is expected. In contrast, processes that are time
irreversible yield values of $v_{13} > 0$ or $v_{13} < 0$. Both
parameters, $v_{12}$ and $v_{13}$, are typically employed to
detect nonlinearities in time series [14].

Finally, an additional parameter $v_{14}$ was added to the
set. This parameter is inspired by linear prediction
coding algorithms typically employed in the analysis
and synthesis of human speech to identify voiced or
unvoiced sound units [16, 10]. In order to compute this
parameter, the autocorrelation of the linear prediction
error ($R_{ee}$) and the pitch period ($T_P$ = .2 s) of the
beluga vocalization have been estimated. The
autocorrelation is evaluated at the estimated pitch period
as seen in Eq. (7). This value is normalized by the
vocalization energy and the factor $(1 - T_P/N)$ to
compensate for the triangular decay of the autocorrelation
estimate.

$$v_{14} = \frac{1}{1 - T_P/N}\,\frac{R_{ee}(T_P)}{R_{ee}(0)} \qquad (7)$$

Classification in a Controlled Environment 

The proposed algorithm has been tested with real 
beluga signals recorded in the Oceanografic of 
Valencia. Details are given as follows. 

Beluga whale vocalizations were recorded with a
Roland (Edirol) FA-101 sound acquisition system (24
bits and sampling frequency $f_s$ = 96 kHz), a Bruel&Kjaer
8103 hydrophone (0.1 Hz-180 kHz) and a Bruel&Kjaer
2692 Nexus amplifier (0.1 Hz-100 kHz). These audio
files were joined into a single file containing all the
different vocalizations. The Oceanografic biologist
listened to and classified each of the acoustic events
(echolocation pulses were excluded). The
aforementioned file contains 560 vocalizations
from two different specimens: the female beluga
(Yulka) and the male (Cairo).

The described activity detector has been applied to
this test file with the following settings. The longest
beluga vocalization with communicative sense is chosen to
be $M$ = 1.5 s and the shortest is $N$ = 0.1 s. The
detection algorithm was set to work with $\Delta\alpha = 2.1$
and $PFA_{np} = 10^{-4}$. With these settings, the
detection percentage and false alarm rate were
obtained by comparing the output of the
proposed detector with the biologist's detections. The
achieved percentages at the output of the detector are
PD = 98.1 % and PFA = 37.6 %. The rather high
value of the PFA is due to the fact that the algorithm
settings are fixed so that no beluga sound unit is
missed (according to the biologist's manual
classification). The detected sound units were fed into
the proposed classifier.

A simple Naive Bayes classification algorithm was
used to obtain the recognition rate [6]. The training set
and test set were drawn from the database in
[12]. Both have roughly the same vocalization
proportions.
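For readers unfamiliar with the classifier, a minimal Naive Bayes sketch over feature vectors is given below. It assumes Gaussian class-conditional densities for each feature, which is a common choice but is not stated in this paper; the class name, the small variance floor and the toy data in the usage note are likewise assumptions.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, features, labels):
        grouped = defaultdict(list)
        for row, label in zip(features, labels):
            grouped[label].append(row)
        self.model = {}
        for label, rows in grouped.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # A small floor avoids a zero variance for constant features.
            variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                         for col, m in zip(columns, means)]
            log_prior = math.log(n / len(labels))
            self.model[label] = (log_prior, means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.model[label]
            log_likelihood = sum(
                -0.5 * math.log(2.0 * math.pi * var)
                - (v - m) ** 2 / (2.0 * var)
                for v, m, var in zip(row, means, variances))
            return log_prior + log_likelihood
        return max(self.model, key=log_posterior)
```

In use, each training row would be the 14-element vector $[v_1, \ldots, v_{14}]$ of a labelled sound unit, for example `GaussianNaiveBayes().fit(rows, labels).predict(new_row)`.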

The overall correct classification percentage was 88.3%.
However, this percentage was not equally distributed
among the categories. Table 3 shows the
recognition rate per category and the confusion matrix.
As can be seen, the classification
percentage for the noise category allows almost every
undesired detection produced at the
detection stage to be rejected. This fact can be exploited by the
proposed system to achieve very low losses of beluga
sounds by lowering the detection thresholds at the
detection stage.


TABLE 3 CONFUSION MATRIX. AVERAGE RECOGNITION RATE PER
CATEGORY (GREY)

| Category | Tonal | Pulsed | Jawclap | Noise |
|---|---|---|---|---|
| Tonal | 83.7% | 14.8% | 1.1% | 0.4% |
| Pulsed | 16.1% | 79.9% | 2.4% | 1.6% |
| Jawclap | 0% | 6.4% | 88.2% | 6.4% |
| Noise | 0% | 0% | 3.9% | 96.1% |

[Figure 7: plot of a mixed beluga sound with the pulsed components labelled; horizontal axis: time (seconds), 0 to 0.6.]

FIG. 7 MIXED (PULSED AND TONAL COMPONENTS) BELUGA
SOUND

The best classification rate was obtained for
the jawclap category. However, such high classification
percentages were not achieved for the tonal and pulsed
categories. A possible explanation is
that beluga whales can produce sounds with both
components (pulsed and tonal) mixed, as can be
seen in Fig. 7. The proposed algorithm may then
classify the sound unit as its predominant component
(tonal or pulsed), which may not match the biologist's
criteria.

Nonlinearity Measure of Some Beluga 
Sounds 

It has been shown in [23] that nonlinear production
mechanisms allow individuals to generate highly
complex and unpredictable vocalizations without
requiring equivalently complex neural control
mechanisms. In [4], the presence of nonlinearities was
observed and measured for humpback whales. In [13],
qualitative descriptions and quantitative analyses of
nonlinearities in the vocalizations of killer whales
(Orcinus orca) and North Atlantic right whales
(Eubalaena glacialis) were provided.

Some of the features proposed in Table 2 can be used
in a similar way to automatically detect the presence
of nonlinearities in certain vocalizations of beluga
whales. To show this, Fig. 8 plots
the values of the feature $|v_{13}|$ (time reversibility
measure) against the feature $v_{14}$ (voiced/unvoiced
parameter), obtained from a recording
containing 313 vocalizations of all the categories
described in Table 1. As can be seen, most jawclaps, as
well as some tonal and pulsed vocalizations, seem to
be produced by nonlinear mechanisms (non-zero time
reversibility values). However, a more detailed study
is needed to assert that high values of the
nonlinear indicators really come from nonlinearities.

[Figure 8: scatter plot of $|v_{13}|$ against $v_{14}$; legend: Tonal, Pulsed, JawClap.]

FIG. 8 FEATURE $|v_{13}|$ (TIME REVERSIBILITY MEASURE) OF
DIFFERENT CATEGORIES OVER FEATURE $v_{14}$ (VOICED/
UNVOICED PARAMETER)
A surrogate test has been carried out to measure the
degree of nonlinearity. This is a procedure in which a
given metric is evaluated on the original data and on a
surrogate version of the data, artificially generated to
resemble the original data.

The method employed for surrogate generation, the
Iterative Amplitude Adapted Fourier Transform
(iAAFT) [21], can be employed for stationary time
series even when the data do not follow a Gaussian
distribution. The surrogate data generated have both
the amplitude spectrum and the statistical distribution
of the signal matched to the original data.

Because beluga sound units are non-stationary,
the surrogate algorithm can be
employed only with those sound fragments where
stationarity holds. Thus, only small fragments of 10 ms
of the tonal category of beluga whale sounds are
processed using this technique. Otherwise,
surrogate data generation does not preserve
non-stationarity and differences in the score values will be
due to non-stationarity rather than nonlinearity.
Additionally, using only tonal vocalizations gives an
indication of whether beluga whales produce
irregular periodic vibrations similar to those produced
in terrestrial mammals [7].

The tonal vocalizations employed to perform the
nonlinearity measure are named Whistle Creak, Creak
Whistle and Flat Whistle. The spectrograms of
these sound units can be seen in Fig. 9. Inspection of their
time-frequency representations for frequency jumps and
subharmonics showed evidence of nonlinear behaviour.
The selected features (metrics) for the
surrogate test are $v_{12}$ and $v_{13}$ (both features are
typically employed to detect nonlinearities in time
series) [16]. The procedure is as follows:
first, a significant set of surrogate series is artificially
generated with the aforementioned algorithm; then,
statistics sensitive to nonlinearity ($v_{12}$ and $v_{13}$) are
determined on both the surrogate and the original
time series; and finally, the values are compared by
means of a rank test. The position index (rank) of each
feature with respect to the surrogates is determined.

The percentage rank (Pr[%]) has been calculated for
several regions inside a given vocalization (red
markers in Fig. 9). The results, obtained for 200
surrogates in each test, can be seen in Table 4.
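The rank step of the surrogate test can be sketched as follows. Two things here are assumptions rather than the procedure of this paper: the percentage rank is taken as the share of surrogate values lying below the value of the original series, and the surrogates are plain random shuffles. Shuffling preserves only the amplitude distribution, not the amplitude spectrum, so it is a much cruder stand-in for the iAAFT method, used only to keep the example short.

```python
import math
import random


def percentage_rank(original_value, surrogate_values):
    """Percentage of surrogate metric values below the original value."""
    below = sum(1 for s in surrogate_values if s < original_value)
    return 100.0 * below / len(surrogate_values)


def time_reversibility(x):
    """Mean cubed first difference over the cubed slope std deviation."""
    slopes = [x[n] - x[n - 1] for n in range(1, len(x))]
    mean = sum(slopes) / len(slopes)
    sigma = math.sqrt(sum((s - mean) ** 2 for s in slopes) / len(slopes))
    return sum(s ** 3 for s in slopes) / (len(slopes) * sigma ** 3)


def shuffle_surrogates(x, count, seed=0):
    """Random-shuffle surrogates (stand-in for iAAFT, see the lead-in)."""
    rng = random.Random(seed)
    surrogates = []
    for _ in range(count):
        s = list(x)
        rng.shuffle(s)
        surrogates.append(s)
    return surrogates


def surrogate_rank(x, metric, count=200, seed=0):
    """Rank the metric of x against the same metric on its surrogates."""
    values = [metric(s) for s in shuffle_surrogates(x, count, seed)]
    return percentage_rank(metric(x), values)
```

A rank close to 0% or to 100% means that the original fragment behaves unlike its surrogates for that metric, which is what is read as an indication of nonlinearity; ranks near the middle give no such indication.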

The results of Table 4 show that if a short time
window analysis of tonal vocalizations is employed,
the high values of $v_{12}$ and $v_{13}$ indicate the presence of
nonlinearities in the beluga whale vocalizations. These
results can be used to devise automatic tests to look
for subharmonics and frequency jumps in beluga
sounds. The study of nonlinearities in beluga
vocalizations could be a key factor in the study of
beluga anatomy. For example, the presence of
nonlinearities in the vocalizations could indicate that
some of the complex tonal beluga whale sounds may
be generated by irregular vibration of some sort of
fold similar to the vocal folds of terrestrial mammals [7].
However, it is important to highlight that not all
vocalizations that appear in Fig. 9 and exhibit a high
$v_{13}$ value are due to nonlinear dynamics. This feature,
calculated in the classification analysis over the whole
vocalization length, may indicate non-stationarity as
well. When developing tests for nonlinearity detection,
it will therefore be important to guarantee the stationarity of
the whale song fragment or to use an alternative
surrogate data generation method that can be employed with
non-stationary signals [1, 19].

[Figure 9: spectrograms of the tonal sound units, including a panel titled "Flat Whistle"; horizontal axis: time (samples).]

FIG. 9 TIME-FREQUENCY REPRESENTATIONS OF DIFFERENT
TONAL BELUGA SOUND UNITS. THE RED MARKERS
INDICATE THE REGIONS WHERE THE NONLINEARITY RANK
INDEX FOR THE PRESENTED METRICS HAS BEEN COMPUTED.
THE ARROWS POINT TO OBSERVED
NONLINEARITIES

TABLE 4 PERCENTAGE RANK (Pr[%]) FOR DIFFERENT SOUNDS. THE
NUMBER OF SURROGATE DATA EMPLOYED IS 200. THE TABLE REFERS TO
THE ANALYSIS PERFORMED AT THE POSITIONS OF THE 1ST, 2ND AND 3RD
RED MARKERS OF FIG. 9

| Sound name | Rank test | 1st | 2nd | 3rd |
|---|---|---|---|---|
| Whistle Creak | $v_{12}$ | 30% | 71% | 73% |
| Whistle Creak | $v_{13}$ | 63% | 36% | 56% |
| Creak Whistle | $v_{12}$ | 99% | 100% | 44% |
| Creak Whistle | $v_{13}$ | 89% | 99% | 50% |
| Flat Whistle | $v_{12}$ | 96% | 29% | 94% |
| Flat Whistle | $v_{13}$ | 49% | 71% | 68% |


Conclusions and Future Work 

An automatic system has been developed for beluga 
sound detection and classification. The proposed 
algorithm has been designed to work in real time with 
moderate computational requirements. 

The detection algorithm is based on an adaptive
threshold energy detector, whereas the classification is
performed with a Naive Bayes classifier. This work
demonstrates that feature extraction combined with
machine learning is a feasible alternative to
the typical approaches based on
pattern recognition of time-frequency representations
of cetacean sounds. The proposed detection
algorithm, despite its simplicity, gives good results in
simulations and with real signals (98% detection
percentage).

The results, according to the classification proposed by
biologists, are divided into three categories: tonal
sounds (communicative), pulsed sounds
(communicative and aggressive) and jawclaps
(aggressive). Despite the complex nature of some
beluga sounds (with mixed tonal and pulsed
components), the classification percentages achieved
are quite high. The average classification percentage
of the system is close to 88%.

On the other hand, this work illustrates that feature
extraction can be a useful tool to design automatic
detectors of irregularities in beluga whale
vocalizations. Nonlinearities in beluga whale songs
can be quantitatively characterized by
comparing nonlinear metrics with surrogate data. This
may lead to indicators that are capable of
automatically detecting and characterizing "nonlinear
dynamics" in beluga bioacoustics. The detection of frequency
jumps and subharmonics is given as an example.

Although promising results have been obtained, the
proposed approach has to be tested in the open sea and
for different cetacean species. The authors have started
to work on these topics as well as on many others related
to the characterization and signal modality of cetacean
sounds.